241 research outputs found

    Incremental Genetic K-means Algorithm and its Application in Gene Expression Data Analysis

    Background: In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, and SOM, genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as microarrays, new efficient and effective clustering methods must be developed to process this growing amount of biological data. Results: In this paper, we propose a new clustering algorithm, the Incremental Genetic K-means Algorithm (IGKA). IGKA is an extension of our previously proposed clustering algorithm, the Fast Genetic K-means Algorithm (FGKA). IGKA outperforms FGKA when the mutation probability is small. The main idea of IGKA is to calculate the objective value, Total Within-Cluster Variation (TWCV), and the cluster centroids incrementally whenever the mutation probability is small. IGKA inherits the salient feature of FGKA of always converging to the global optimum. A C program is freely available at http://database.cs.wayne.edu/proj/FGKA/index.htm. Conclusions: Our experiments indicate that, while IGKA has a convergence pattern similar to that of FGKA, it has better time performance once the mutation probability decreases to some point. Finally, we used IGKA to cluster a yeast dataset and found that it increased the enrichment of genes of similar function within the cluster.
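    The incremental idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' C program and makes no claim about IGKA's actual data structures. It relies only on the standard algebraic identity TWCV = (sum of all squared feature values) - sum over clusters of (squared norm of the cluster's feature sum / cluster size), so that moving one point between two clusters changes only those two clusters' terms. The class name `IncrementalTWCV`, its methods, and the helper `brute_force_twcv` are hypothetical names chosen for illustration.

```python
from typing import List


def brute_force_twcv(points: List[List[float]], labels: List[int], k: int) -> float:
    """Recompute TWCV from scratch: squared distance of each point to its centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


class IncrementalTWCV:
    """Hypothetical helper: keeps per-cluster counts and feature sums so that
    TWCV and centroids can be updated when a single point changes cluster."""

    def __init__(self, points: List[List[float]], labels: List[int], k: int):
        self.points = points
        self.labels = list(labels)
        dim = len(points[0])
        self.counts = [0] * k
        self.sums = [[0.0] * dim for _ in range(k)]
        for p, c in zip(points, self.labels):
            self.counts[c] += 1
            for d, v in enumerate(p):
                self.sums[c][d] += v
        # Sum of all squared feature values; independent of the clustering.
        sq_total = sum(v * v for p in points for v in p)
        self.value = sq_total - sum(self._term(c) for c in range(k))

    def _term(self, c: int) -> float:
        """Contribution of cluster c: ||feature sum||^2 / size (0 if empty)."""
        if self.counts[c] == 0:
            return 0.0
        return sum(s * s for s in self.sums[c]) / self.counts[c]

    def centroid(self, c: int) -> List[float]:
        """Centroid of a non-empty cluster, derived from the cached sums."""
        return [s / self.counts[c] for s in self.sums[c]]

    def move(self, i: int, new_c: int) -> None:
        """Reassign point i to cluster new_c, touching only two clusters."""
        old_c = self.labels[i]
        if old_c == new_c:
            return
        before = self._term(old_c) + self._term(new_c)
        for d, v in enumerate(self.points[i]):
            self.sums[old_c][d] -= v
            self.sums[new_c][d] += v
        self.counts[old_c] -= 1
        self.counts[new_c] += 1
        self.labels[i] = new_c
        after = self._term(old_c) + self._term(new_c)
        self.value += before - after


pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
tracker = IncrementalTWCV(pts, [0, 0, 1, 1], k=2)
print(tracker.value)  # 4.0 for this toy clustering
tracker.move(1, 1)    # reassign a single point, as a mutation might
print(tracker.value, brute_force_twcv(pts, tracker.labels, 2))
```

    When only a few points change cluster per generation, each update costs time proportional to the number of features rather than to the size of the dataset, which is consistent with the abstract's remark that the incremental approach pays off when the mutation probability is small.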

    Cloud Computing and Grid Computing 360-Degree Compared

    Cloud Computing has become another buzzword after Web 2.0. However, there are dozens of different definitions for Cloud Computing, and there seems to be no consensus on what a Cloud is. On the other hand, Cloud Computing is not a completely new concept; it is intricately connected to the relatively new but thirteen-year-established Grid Computing paradigm and to other relevant technologies such as utility computing, cluster computing, and distributed systems in general. This paper strives to compare and contrast Cloud Computing with Grid Computing from various angles and to give insights into the essential characteristics of both. Comment: IEEE Grid Computing Environments (GCE08) 200

    ImageSpace: An Environment for Image Ontology Management

    More and more researchers have realized that ontologies will play a critical role in the development of the Semantic Web, the next-generation Web in which content is consumable not only by humans but also by software agents. The development of tools to support ontology management, including creation, visualization, annotation, database storage, and retrieval, is thus extremely important. We have developed ImageSpace, an image ontology creation and annotation tool that features (1) full support for the standard web ontology language DAML+OIL; (2) image ontology creation, visualization, image annotation, and display in one integrated framework; (3) ontology consistency assurance; and (4) storage of ontologies and annotations in relational databases. It is expected that the availability of such a tool will greatly facilitate the creation of image repositories as islands of the Semantic Web.

    FM-test: a fuzzy-set-theory-based approach to differential gene expression data analysis

    Background: Microarray techniques have revolutionized genomic research by making it possible to monitor the expression of thousands of genes in parallel. As the amount of microarray data being produced is increasing at an exponential rate, there is a great demand for efficient and effective expression data analysis tools. Comparing the gene expression profiles of patients against those of healthy counterparts will enhance our understanding of a disease and identify leads for therapeutic intervention. Results: In this paper, we propose an innovative approach, the fuzzy membership test (FM-test), based on fuzzy set theory, to identify disease-associated genes from microarray gene expression profiles. A new concept, the FM d-value, is defined to quantify the divergence of two sets of values. We further analyze the asymptotic property of FM-test and then establish the relationship between the FM d-value and the p-value. We applied FM-test to a diabetes expression dataset and a lung cancer expression dataset. Of the 10 significant genes identified in the diabetes dataset, six have been confirmed in the literature to be associated with diabetes, and one has been suggested by other researchers. Of the 10 significantly overexpressed genes identified in the lung cancer data, most (eight) have been confirmed in the literature to be related to lung cancer. Conclusion: Our experiments on synthetic datasets show that FM-test is effective and robust. The results on the diabetes and lung cancer datasets validated the effectiveness of FM-test. FM-test is implemented as a Web-based application and is available for free at http://database.cs.wayne.edu/bioinformatics